Data Collection for Quasar Detection

Goal: Gather images of quasars, non-quasar celestial objects, and quasar candidates.


In [ ]:
import urllib.parse
import urllib.request
from IPython.display import display, Image
import os
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
%matplotlib inline
from astropy import units as u
from astropy.coordinates import SkyCoord
from astropy.table import Table
from astroquery.sdss import SDSS
from astroquery.simbad import Simbad


C:\Users\plessas\AppData\Local\Continuum\Anaconda3\lib\site-packages\astroquery\sdss\__init__.py:28: UserWarning: Experimental: SDSS has not yet been refactored to have its API match the rest of astroquery (but it's nearly there).
  warnings.warn("Experimental: SDSS has not yet been refactored to have its API "

As a test, we will use AstroPy to get a nice image for the project page. We will use SIMBAD to find a random quasar.


In [ ]:
# Limit the number of results we get from our query.
Simbad.ROW_LIMIT = 1000

In [ ]:
result = Simbad.query_criteria('region(box,180d +30d, 8d +8d)', otype='QSO')

In [ ]:
result


Out[ ]:
<Table masked=True length=1000>
MAIN_ID                   RA             DEC            RA_PREC  DEC_PREC  COO_ERR_MAJA  COO_ERR_MINA  COO_ERR_ANGLE  COO_QUAL  COO_WAVELENGTH  COO_BIBCODE
                          "h:m:s"        "d:m:s"                           mas           mas           deg
object                    str13          str13          int16    int16     float32       float32       int16          str1      str1            object
SDSS J114407.30+322402.2  11 44 07.309   +32 24 02.28   7        7         --            --            0              C         O               2014A&A...563A..54P
SDSS J121805.53+312036.0  12 18 05.538   +31 20 36.02   7        7         65.000        56.000        90             C         O               2009yCat.2294....0A
SDSS J121336.30+284552.4  12 13 36.308   +28 45 52.44   7        7         61.000        52.000        90             C         O               2009yCat.2294....0A
SDSS J121309.17+280517.6  12 13 09.174   +28 05 17.60   7        7         --            --            0              C         O               2014A&A...563A..54P
7C 115021.30+332425.00    11 52 51.9101  +33 07 18.763  9        9         0.450         0.400         0              B         R               2011AJ....142...89P
2XMMi J121226.7+291117    12 12 26.793   +29 11 18.04   7        7         61.000        58.000        0              C         O               2009yCat.2294....0A
SDSS J115745.24+335503.5  11 57 45.248   +33 55 03.53   7        7         --            --            0              C         O               2014A&A...563A..54P
SDSS J121722.46+273823.6  12 17 22.464   +27 38 23.67   7        7         68.000        58.000        0              C         O               2009yCat.2294....0A
SDSS J114410.06+321506.9  11 44 10.0701  +32 15 06.807  8        8         --            --            0              B         O               2009A&A...505..385A
...                       ...            ...            ...      ...       ...           ...           ...            ...       ...             ...
CBS 56                    12 17 21.4161  +30 56 30.680  8        8         93.800        81.800        90             B                         2009A&A...505..385A
SDSS J115812.49+260746.4  11 58 12.494   +26 07 46.48   7        7         63.000        53.000        90             C         O               2009yCat.2294....0A
SDSS J114954.42+271128.4  11 49 54.425   +27 11 28.42   7        7         63.000        59.000        0              C         O               2009yCat.2294....0A
SDSS J115344.59+260429.9  11 53 44.592   +26 04 29.99   7        7         65.000        55.000        90             C         O               2009yCat.2294....0A
SDSS J115200.24+311819.1  11 52 00.240   +31 18 19.12   7        7         10.000        10.000        0              C         O               2012ApJS..203...21A
SDSS J120725.28+321530.4  12 07 25.278   +32 15 30.46   7        7         14.000        13.000        0              C         O               2011yCat.2306....0A
SDSS J121739.23+315727.4  12 17 39.235   +31 57 27.44   7        7         60.000        58.000        0              C         O               2009yCat.2294....0A
SDSS J120351.15+315619.0  12 03 51.159   +31 56 19.09   7        7         --            --            0              C         O               2014A&A...563A..54P
SDSS J120908.69+304228.8  12 09 08.699   +30 42 28.85   7        7         64.000        60.000        90             C         O               2009yCat.2294....0A
SDSS J120122.95+315326.9  12 01 22.949   +31 53 26.96   7        7         --            --            0              C         O               2014A&A...563A..54P

In [ ]:
# Choose a random quasar from however many rows the query returned.
qnumber = np.random.randint(0, len(result))
# Get that quasar's coordinates, and format them for SkyCoord
resultRA = result[qnumber]['RA'].split()
RA = '%sh%sm%ss' % (resultRA[0], resultRA[1], resultRA[2])
resultDEC = result[qnumber]['DEC'].split()
DEC = '%sd%sm%ss' % (resultDEC[0], resultDEC[1], resultDEC[2])
# Convert to a SkyCoordinate
QCoord = SkyCoord(RA, DEC, frame='icrs')
QCoord


Out[ ]:
<SkyCoord (ICRS): (ra, dec) in deg
    ( 176.2412375,  31.67712222)>

We will now get an image from the Sloan Digital Sky Survey. The following code follows this tutorial.


In [ ]:
impix = 1024
imsize = 12 * u.arcmin
cutoutbaseurl = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'
query_string = urllib.parse.urlencode(dict(
    ra=QCoord.ra.deg, dec=QCoord.dec.deg, width=impix, height=impix,
    scale=imsize.to(u.arcsec).value / impix))
url = cutoutbaseurl + '?' + query_string
urllib.request.urlretrieve(url, 'Quasar.jpg')


Out[ ]:
('Quasar.jpg', <http.client.HTTPMessage at 0x1f8e3e402b0>)

In [ ]:
display(Image('Quasar.jpg'))
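
The cutout request above depends on one derived quantity: the `scale` parameter is the field of view in arcseconds divided by the image width in pixels. A minimal sketch of that URL construction, using only the standard library, is given here. The helper name `build_cutout_url` is an assumption made for illustration and is not part of any SDSS client; the endpoint is simply the one used in this notebook.

```python
from urllib.parse import urlencode

CUTOUT_BASE_URL = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'


def build_cutout_url(ra_deg, dec_deg, impix=1024, field_arcmin=12.0):
    """Return the cutout URL for a square image centred on (ra_deg, dec_deg).

    build_cutout_url is a hypothetical helper for illustration only.
    """
    # Pixel scale in arcseconds per pixel: field of view / image width.
    scale = field_arcmin * 60.0 / impix
    query = urlencode(dict(ra=ra_deg, dec=dec_deg,
                           width=impix, height=impix, scale=scale))
    return CUTOUT_BASE_URL + '?' + query


# 12 arcmin across 1024 pixels, using the coordinates from the output above.
print(build_cutout_url(176.2412375, 31.67712222))
```

Keeping the field of view fixed and changing `impix` changes the resolution of the image, not the area of sky it covers.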


Gathering Quasar Images

The Penn State Center for Astrostatistics provides a list of 46420 detected quasars. We will use their SDSS_quasar.dat data set and the AstroPy Python package.


In [ ]:
Quasars = pd.read_fwf('SDSS_quasar.dat')

In [ ]:
Quasars.head()


Out[ ]:
SDSS_J R.A. Dec. z u_mag sig_u g_mag sig_g r_mag sig_r ... sig_z Radio X-ray J_mag sig_J H_mag sig_H K_mag sig_K M_i
0 000009.26+151754.5 0.038605 15.298476 1.1986 19.921 0.042 19.811 0.036 19.386 0.017 ... 0.069 -1.0 -9.00 0.000 0.000 0.00 0.000 0.000 0.000 -25.085
1 000009.38+135618.4 0.039088 13.938447 2.2400 19.218 0.026 18.893 0.022 18.445 0.018 ... 0.033 -1.0 -9.00 0.000 0.000 0.00 0.000 0.000 0.000 -27.419
2 000009.42-102751.9 0.039269 -10.464428 1.8442 19.249 0.036 19.029 0.027 18.980 0.021 ... 0.047 0.0 -9.00 0.000 0.000 0.00 0.000 0.000 0.000 -26.459
3 000011.41+145545.6 0.047547 14.929353 0.4596 19.637 0.030 19.466 0.024 19.362 0.022 ... 0.047 -1.0 -9.00 0.000 0.000 0.00 0.000 0.000 0.000 -22.728
4 000011.96+000225.3 0.049842 0.040372 0.4790 18.237 0.028 17.971 0.020 18.025 0.019 ... 0.029 0.0 -1.66 16.651 0.136 15.82 0.149 14.821 0.111 -24.046

5 rows × 23 columns


In [ ]:
Quasars.tail()  # 46420 rows


Out[ ]:
SDSS_J R.A. Dec. z u_mag sig_u g_mag sig_g r_mag sig_r ... sig_z Radio X-ray J_mag sig_J H_mag sig_H K_mag sig_K M_i
46415 235949.46+150430.6 359.9560 15.075185 0.2977 19.094 0.025 18.966 0.023 18.668 0.016 ... 0.033 -1.0 -1.429 16.676 0.180 15.661 0.176 15.187 0.130 -22.286
46416 235953.44-093655.6 359.9726 -9.615454 0.3585 19.509 0.045 19.276 0.022 18.895 0.018 ... 0.039 0.0 -9.000 16.976 0.173 16.188 0.164 15.502 0.238 -22.549
46417 235956.72+135131.7 359.9863 13.858825 2.3826 20.010 0.040 19.427 0.027 19.217 0.018 ... 0.048 -1.0 -9.000 0.000 0.000 0.000 0.000 0.000 0.000 -26.665
46418 235958.21+005139.8 359.9925 0.861062 2.0382 19.256 0.034 19.004 0.021 18.794 0.017 ... 0.036 0.0 -9.000 0.000 0.000 0.000 0.000 0.000 0.000 -26.900
46419 235959.06-090944.0 359.9960 -9.162229 1.2845 18.403 0.021 18.373 0.015 18.139 0.024 ... 0.036 0.0 -9.000 0.000 0.000 0.000 0.000 0.000 0.000 -26.297

5 rows × 23 columns


In [ ]:
coord = SkyCoord(str(Quasars.iloc[1]['R.A.']) + 'd',
                 str(Quasars.iloc[1]['Dec.']) + 'd', frame='icrs')

In [ ]:
impix = 120
imsize = 1 * u.arcmin
cutoutbaseurl = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'
query_string = urllib.parse.urlencode(dict(
    ra=coord.ra.deg, dec=coord.dec.deg, width=impix, height=impix,
    scale=imsize.to(u.arcsec).value / impix))
url = cutoutbaseurl + '?' + query_string
urllib.request.urlretrieve(url, 'Quasar_1.jpg')
display(Image('Quasar_1.jpg'))



In [ ]:
def get_image(coordinate, name, impix=120):
    """Downloads a 1 arcmin cutout from the SDSS DR12 release as an impix pixel by impix pixel image.

    Parameters
    ----------
    coordinate : coordinate of the celestial object as a SkyCoord.
    name : The name string to save the image as. It will be saved as './Images/name.jpg'.
    impix : Width and height of the image in pixels.

    """
    imsize = 1 * u.arcmin
    cutoutbaseurl = 'http://skyservice.pha.jhu.edu/DR12/ImgCutout/getjpeg.aspx'
    # Use the coordinate argument here, not the global `coord` defined earlier;
    # otherwise every call downloads the same patch of sky.
    query_string = urllib.parse.urlencode(dict(
        ra=coordinate.ra.deg, dec=coordinate.dec.deg, width=impix, height=impix,
        scale=imsize.to(u.arcsec).value / impix))
    url = cutoutbaseurl + '?' + query_string
    # Make sure the output directory exists before saving.
    os.makedirs('./Images', exist_ok=True)
    urllib.request.urlretrieve(url, './Images/' + name + '.jpg')

In [ ]:
get_image(coord, 'test1')
# Worked successfully

In [ ]:
# Some data manipulation to get Sky Coordinates for each entry.
# The application of the SkyCoord function will take time.
QuasarLocs = pd.concat([Quasars['R.A.'].apply(lambda x: str(x) + 'd '),
                        Quasars['Dec.'].apply(lambda x: str(x) + 'd')], axis=1)
QuasarLocs['Coords'] = QuasarLocs[['R.A.', 'Dec.']].apply(
    lambda x: SkyCoord(x[0], x[1], frame='icrs'), axis=1)

In [ ]:
QuasarLocs.head()


Out[ ]:
R.A. Dec. Coords
0 0.038605d 15.298476d <SkyCoord (ICRS): (ra, dec) in deg\n ( 0.03...
1 0.039088d 13.938447d <SkyCoord (ICRS): (ra, dec) in deg\n ( 0.03...
2 0.039269d -10.464428d <SkyCoord (ICRS): (ra, dec) in deg\n ( 0.03...
3 0.047547000000000006d 14.929353d <SkyCoord (ICRS): (ra, dec) in deg\n ( 0.04...
4 0.049842000000000004d 0.040372000000000005d <SkyCoord (ICRS): (ra, dec) in deg\n ( 0.04...

In [ ]:
# We will now download these images from SDSS DR12
for i in range(len(QuasarLocs)):  # 46420 rows
    get_image(QuasarLocs['Coords'].iloc[i], name='Quasar_' + str(i))
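
With tens of thousands of requests, a loop like the one above is likely to be interrupted at some point. The following is a sketch of a resumable variant under stated assumptions: it skips files that already exist and retries a failed download a few times before moving on. The names `download_all`, `fetch`, `max_tries`, and `pause` are hypothetical; `fetch(i, path)` stands in for whatever call actually saves image `i` to `path`.

```python
import os
import time


def download_all(n_items, fetch, out_dir, prefix, max_tries=3, pause=1.0):
    """Call fetch(i, path) for each index, skipping files already on disk.

    Returns the list of indices that still failed after max_tries attempts.
    """
    os.makedirs(out_dir, exist_ok=True)
    failed = []
    for i in range(n_items):
        path = os.path.join(out_dir, '%s%d.jpg' % (prefix, i))
        if os.path.exists(path):
            continue  # already downloaded on an earlier run
        for attempt in range(max_tries):
            try:
                fetch(i, path)
                break
            except OSError:
                # urllib's URLError and HTTPError derive from OSError;
                # wait a little longer after each failure, then try again.
                time.sleep(pause * (attempt + 1))
        else:
            failed.append(i)  # every attempt raised
    return failed
```

Because existing files are skipped, rerunning the same call after an interruption only requests the images that are still missing.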

Gathering Non-quasar Celestial Objects

We will use SIMBAD to find objects that are not Quasars or Quasar Candidates. We will sample 200 random regions in the SDSS footprint and take 500 objects from each region.


In [ ]:
# Limit the number of results we get from our query.
Simbad.ROW_LIMIT = 20000

In [ ]:
# We will stay in the 8h to 16h, +0d to +60d footprint region of SDSS.
# Note that there are some regions in SDSS DR 12 and DR 13 outside of this range,
# but this range covers a majority of the footprint.
# As the box we form is 8d by 8d, box centers are drawn from 128d to 236d
# in right ascension, and from +4d to +56d in declination.
NonQuasars = pd.DataFrame()
randcoord = []
for i in range(200):
    randcoord.append(str(np.random.randint(128, 237)) +
                     'd +' + str(np.random.randint(4, 57)) + 'd')
    try:
        # For otype, QSO are quasars, Q? are quasar candidates, and LeQ are gravitationally lensed quasars.
        result = Simbad.query_criteria(
            'region(box,' + randcoord[i] + ', 4d +4d)', 'otype != QSO', 'otype != Q?', 'otype != LeQ')
        sample = result.to_pandas().sample(500)
        NonQuasars = pd.concat([NonQuasars, sample], axis=0)
        if i % 10 == 0:
            print('At attempt %s' % i)
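    except Exception:
        # Despite the message, nothing is retried: reassigning i inside a
        # for loop does not repeat the iteration, so this region is skipped.
        print('Attempt Failed... retrying')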


At attempt [0]
At attempt [10]
At attempt [20]
At attempt [30]
At attempt [40]
At attempt [50]
At attempt [60]
Attempt Failed... retrying
At attempt [70]
At attempt [80]
At attempt [90]
At attempt [100]
At attempt [110]
At attempt [120]
At attempt [130]
At attempt [140]
At attempt [150]
At attempt [160]
Attempt Failed... retrying
At attempt [170]
At attempt [180]
At attempt [190]
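
Reassigning the loop variable inside a Python `for` loop does not repeat an iteration, so the failed attempts logged above are skipped, not retried. One way to get true retry behaviour is to loop until the desired number of regions has succeeded. In this sketch, `collect_regions` and `query_region` are hypothetical names; `query_region(center)` stands in for the SIMBAD query and may raise on failure.

```python
import random


def collect_regions(n_regions, query_region, max_failures=50, seed=None):
    """Query random box centres until n_regions queries have succeeded."""
    rng = random.Random(seed)
    results = []
    failures = 0
    while len(results) < n_regions:
        # Same ranges as the loop above: 128..236 deg RA, +4..+56 deg Dec.
        center = '%dd +%dd' % (rng.randint(128, 236), rng.randint(4, 56))
        try:
            results.append((center, query_region(center)))
        except Exception:
            failures += 1
            if failures > max_failures:
                raise RuntimeError('too many failed queries')
    return results
```

The `max_failures` cap is a design choice to keep a persistent outage from turning the loop into an endless stream of requests.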

In [ ]:
NonQuasars


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
4419 b'[TRB2013] 227.70742+05.81570' 15 10 49.786 +05 48 56.56 7.0 7.0 10.000000 10.000000 90.0 C O b'2013yCat.5139....0A'
78 b'2MASX J15190843+0725544' 15 19 08.437 +07 25 54.48 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
2478 b'[TCC97] ACO 2052 329' 15 14 04.6 +06 22 54 5.0 5.0 5000.000000 5000.000000 90.0 D b'1997A&AS..125..459T'
2412 b'PTF 12fuu' 15 04 40.39 +06 04 21.0 6.0 6.0 NaN NaN 0.0 D b'2012ATel.4290....1G'
4127 b'WiggleZ S15J151547006+05441318' 15 15 47.004 +05 44 13.17 7.0 7.0 202.000000 104.000000 0.0 C O b'2009yCat.2294....0A'
5836 b'SDSSCGA 184' 15 08 32.2 +05 50 35 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
1139 b'2MASX J15145416+0414434' 15 14 54.166 +04 14 43.50 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
6324 b'WiggleZ S15J151641226+05502728' 15 16 41.227 +05 50 27.29 7.0 7.0 117.000000 115.000000 90.0 C O b'2009yCat.2294....0A'
5113 b'[TRB2013] 227.62819+05.26153' 15 10 30.766 +05 15 41.51 7.0 7.0 38.000000 16.000000 0.0 C O b'2013yCat.5139....0A'
730 b'WiggleZ S15J151830812+04190651' 15 18 30.807 +04 19 06.30 7.0 7.0 74.000000 69.000000 0.0 C O b'2009yCat.2294....0A'
2859 b'LEDA 1273686' 15 18 46.974 +04 48 56.93 7.0 7.0 46.000000 45.000000 85.0 C O b'2009yCat.2294....0A'
5869 b'SDSS J151630.98+064720.7' 15 16 30.974 +06 47 21.76 7.0 7.0 NaN NaN 0.0 C b'2005AJ....129.1483L'
5242 b'SDSSCGB 3530.2' 15 07 24.437 +07 03 24.50 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
3268 b'WiggleZ S15J151114512+04065113' 15 11 14.519 +04 06 51.38 7.0 7.0 89.000000 52.000000 0.0 C O b'2009yCat.2294....0A'
6466 b'[BKB95] 1511+0722' 15 14.2 +07 11 3.0 3.0 NaN NaN 0.0 E b''
4157 b'BD+05 2991s' 15 17 24.2675 +04 58 03.296 8.0 8.0 46.849998 35.950001 90.0 B b'1998A&A...335L..65H'
1894 b'[TRB2013] 227.30498+05.74008' 15 09 13.196 +05 44 24.27 7.0 7.0 10.000000 10.000000 90.0 C O b'2013yCat.5139....0A'
703 b'SDSSCGB 54836.4' 15 04 56.222 +06 09 54.71 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
6192 b'[RGD2013] J151836.39+044929.78' 15 18 36.388 +04 49 29.77 7.0 7.0 12.000000 12.000000 90.0 C O b'2013yCat.5139....0A'
4972 b'LARCS a2055r08-1363' 15 18 26.619 +05 59 30.04 7.0 7.0 10.000000 10.000000 90.0 C O b'2013yCat.5139....0A'
3970 b'2MASX J15170524+0653160' 15 17 05.244 +06 53 16.05 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
4889 b'LEDA 84537' 15 14 55.5 +07 18 37 5.0 5.0 3000.000000 3000.000000 90.0 D b'1980ApJS...42..565D'
4670 b'WiggleZ S15J150712196+05485479' 15 07 12.1960 +05 48 54.790 11.0 11.0 NaN NaN 0.0 C O b'2010MNRAS.401.1429D'
4555 b'2MFGC 12313' 15 13 27.221 +07 45 02.32 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
948 b'SDSSCGB 60722.2' 15 14 11.648 +07 48 03.23 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
6014 b'NVSS J151516+060404' 15 15 16.60 +06 04 04.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
1428 b'[TCC97] ACO 2052 326' 15 18 43.3 +07 16 09 5.0 5.0 5000.000000 5000.000000 90.0 D b'1997A&AS..125..459T'
2722 b'WiggleZ S15J150601990+06052664' 15 06 01.991 +06 05 26.62 7.0 7.0 128.000000 107.000000 90.0 C O b'2009yCat.2294....0A'
1264 b'SDSS J151756.82+060009.4' 15 17 56.811 +06 00 09.31 7.0 7.0 130.000000 110.000000 89.0 B N b'2003yCat.2246....0C'
5896 b'WiggleZ S15J151715725+05205030' 15 17 15.722 +05 20 50.29 7.0 7.0 108.000000 78.000000 90.0 C O b'2009yCat.2294....0A'
... ... ... ... ... ... ... ... ... ... ... ...
1678 b'[SPD2011] 17637' 15 18 38.1 +50 23 04 5.0 5.0 NaN NaN 0.0 E O b'2011ApJ...736...21S'
2124 b'TYC 3488-370-1' 15 21 01.9167 +52 13 58.249 8.0 8.0 41.689999 32.049999 90.0 B b'1998A&A...335L..65H'
437 b'NVSS J151325+501043' 15 13 25.10 +50 10 43.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
837 b'LEDA 2339964' 15 15 52.8 +49 19 54 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
789 b'SDSSCGB 54512.1' 15 06 55.298 +49 45 04.44 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
347 b'SDSSCGB 41762.4' 15 07 41.778 +49 30 05.40 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
1476 b'NVSS J150252+515004' 15 02 51.20 +51 49 58.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
1787 b'NVSS J150453+503008' 15 04 53.0 +50 30 09 5.0 5.0 9000.000000 9000.000000 90.0 D R b'1998AJ....115.1693C'
202 b'SDSSCGB 27021' 15 07 10.9 +50 55 27 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
2267 b'GB1 1507+517' 15 08 38.2 +51 29 42 5.0 5.0 NaN NaN 0.0 D b'1972AcA....22..227M'
2150 b'LEDA 2409324' 14 58 58.2 +52 12 34 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
2041 b'LEDA 2335127' 15 19 55.593 +49 09 38.45 7.0 7.0 42.000000 42.000000 0.0 C O b'2009yCat.2294....0A'
2405 b'LEDA 2391614' 15 20 09.4 +51 17 15 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
2171 b'LEDA 2353010' 15 00 55.6 +49 46 29 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
1417 b'TYC 3488-622-1' 15 18 13.164 +51 33 16.23 7.0 7.0 267.000000 173.000000 90.0 B O b'2000A&A...355L..27H'
916 b'TYC 3484-1447-1' 15 00 29.775 +49 52 56.72 7.0 7.0 105.000000 67.000000 90.0 B O b'2000A&A...355L..27H'
2627 b'SDSS J151941.29+525542.3' 15 19 41.295 +52 55 42.37 7.0 7.0 9000.000000 59.000000 90.0 C O b'2009yCat.2294....0A'
442 b'SDSS J151600.25+524355.8' 15 16 00.26 +52 43 55.8 6.0 6.0 90.000000 70.000000 90.0 B I b'2003yCat.2246....0C'
2099 b'SDSSCGB 74769.1' 15 15 25.130 +50 55 55.42 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2558 b'SDSSCGB 48665' 15 03 38.9 +52 49 21 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
2392 b'SDSSCGB 64880' 15 06 21.4 +52 33 23 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
232 b'SDSSCGB 38974.4' 15 00 23.244 +52 34 27.44 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
413 b'LEDA 2350385' 15 08 18.731 +49 41 13.23 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
987 b'StKM 1-1204' 15 02 25.744 +49 17 05.41 7.0 7.0 145.000000 90.000000 90.0 B O b'2000A&A...355L..27H'
1799 b'LEDA 2411016' 15 08 00.3 +52 16 41 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
157 b'SDSSCGB 51281.3' 15 05 34.541 +52 00 48.57 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2050 b'NVSS J151555+503517' 15 15 55.1 +50 35 14 5.0 5.0 1000.000000 1000.000000 90.0 C R b'1997ApJ...475..479W'
94 b'SDSS J151954.89+494751.5' 15 19 54.904 +49 47 51.58 7.0 7.0 90.000000 90.000000 90.0 B N b'2003yCat.2246....0C'
865 b'MCG+09-25-033' 15 12 17.963 +50 33 28.53 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
1596 b'[SPD2011] 7549' 14 57 40.1 +49 30 15 5.0 5.0 NaN NaN 0.0 E O b'2011ApJ...736...21S'

99000 rows × 11 columns


In [ ]:
# Checking for duplicates, which is a possibility in this process.
NonQuasars[NonQuasars.duplicated()]


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
1297 b'NSC J124834+550230' 12 48 34.83 +55 02 29.6 6.0 6.0 NaN NaN 0.0 D b'2003AJ....125.2064G'
167 b'SDSS J124958.66+554925.5' 12 49 58.666 +55 49 25.52 7.0 7.0 65.000000 48.000000 0.0 C O b'2009yCat.2294....0A'
1378 b'SDSSCGB 73374.4' 12 47 17.287 +53 58 39.34 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2421 b'SDSS J124802.92+552828.7' 12 48 02.920 +55 28 28.72 7.0 7.0 60.000000 44.000000 0.0 C O b'2009yCat.2294....0A'
1332 b'LEDA 2480646' 12 49 47.0 +54 47 52 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
1141 b'SDSSCGB 20128.1' 12 47 08.501 +55 28 05.19 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
697 b'HD 238160' 12 47 26.3265 +54 46 13.054 8.0 8.0 48.110001 26.209999 90.0 B b'1998A&A...335L..65H'
2865 b'SDSSCGB 47157' 12 46 13.1 +55 47 39 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
2054 b'SDSS J124840.34+541726.2' 12 48 40.34 +54 17 26.2 6.0 6.0 NaN NaN 0.0 C O b'2008ApJS..175..297A'
2215 b'LEDA 2490425' 12 47 19.5 +55 05 49 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
3225 b'SDSSCGB 20430.3' 14 28 20.396 +20 23 35.15 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2630 b'SDSS J143024.55+205609.7' 14 30 24.551 +20 56 09.73 7.0 7.0 8000.000000 68.000000 90.0 C O b'2009yCat.2294....0A'
1817 b'1RXS J143857.7+200422' 14 38 57.701 +20 04 22.51 7.0 7.0 NaN NaN 0.0 D b''
2669 b'FIRST J143859.5+175040' 14 38 59.636 +17 50 40.57 7.0 7.0 NaN NaN 0.0 D b'2012MNRAS.421.1569B'
1440 b'TYC 1482-1100-1' 14 30 31.341 +20 08 55.97 7.0 7.0 80.000000 79.000000 69.0 B O b'2000A&A...355L..27H'
791 b'LEDA 1541618' 14 38 48.5 +17 47 08 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
2439 b'LEDA 1635240' 14 28 35.3 +20 45 31 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
2607 b'SDSSCGB 12830.4' 14 31 21.556 +20 05 07.36 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
1353 b'LEDA 1628041' 14 39 52.703 +20 27 38.94 7.0 7.0 88.000000 65.000000 90.0 C O b'2009yCat.2294....0A'
3379 b'LEDA 1639466' 14 40 12.3 +20 57 08 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
153 b'LEDA 1634259' 14 29 23.7 +20 42 50 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
1192 b'SDSS J143328.49+201352.1' 14 33 28.477 +20 13 51.60 7.0 7.0 380.000000 350.000000 99.0 B N b'2003yCat.2246....0C'
2909 b'2MASX J14381862+1736335' 14 38 18.622 +17 36 33.57 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
214 b'SDSS J143138.85+202214.2' 14 31 38.856 +20 22 14.24 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2524 b'SDSSCGB 47349.3' 14 30 50.766 +20 33 02.58 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2552 b'NVSS J143901+171535' 14 39 01.20 +17 15 30.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
2018 b'SDSSCGB 7626.3' 14 29 21.819 +20 18 52.98 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2206 b'SDSSCGB 54754.3' 14 29 02.146 +20 32 23.20 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
1533 b'SDSS J143505.84+203708.9' 14 35 05.844 +20 37 08.92 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2528 b'SDSSCGB 6460' 14 31 27.4 +20 52 12 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
... ... ... ... ... ... ... ... ... ... ... ...
1 b'LINEAR 13271465' 15 03 48.53 +50 32 38.5 6.0 6.0 NaN NaN 0.0 C O b'2013ApJ...765..154D'
1254 b'SDSS J152014.32+495125.1' 15 20 14.326 +49 51 25.11 7.0 7.0 5000.000000 125.000000 90.0 C O b'2009yCat.2294....0A'
1442 b'[KOS87] 150332+515800' 15 05 04 +51 46.4 4.0 4.0 NaN NaN 0.0 E b'1987ApJ...314..493K'
1516 b'SDSS J152017.20+514332.9' 15 20 17.207 +51 43 32.92 7.0 7.0 2.000000 2.000000 90.0 C O b'2012ApJS..203...21A'
2353 b'SDSSCGB 28489.3' 15 04 27.378 +49 51 38.91 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
1414 b'SDSS J150647.02+490605.0' 15 06 47.020 +49 06 05.09 7.0 7.0 42.000000 40.000000 63.0 C O b'2009yCat.2294....0A'
2261 b'SBSS 1510+510' 15 12 11.422 +50 48 51.34 7.0 7.0 320.000000 280.000000 81.0 B N b'2003yCat.2246....0C'
2390 b'NVSS J150743+510805' 15 07 44.20 +51 08 02.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
2329 b'BPS BS 16083-0130' 15 11 52.243 +51 42 57.49 7.0 7.0 20.000000 18.000000 90.0 B O b'2012yCat.1322....0Z'
1880 b'HD 136595' 15 19 37.4134 +49 23 22.443 8.0 8.0 17.370001 16.110001 175.0 B b'1998A&A...335L..65H'
1784 b'NVSS J150521+512837' 15 05 22.10 +51 28 37.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
1937 b'NVSS J151805+515853' 15 18 05.70 +51 58 57.0 6.0 6.0 500.000000 500.000000 90.0 D b'1997ApJ...475..479W'
795 b'SDSS J151505.09+492732.5' 15 15 05.099 +49 27 32.54 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
1247 b'LEDA 2360970' 15 14 10.9 +50 02 12 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
1978 b'NVSS J151509+510115' 15 15 09.40 +51 01 15.1 6.0 6.0 700.000000 700.000000 90.0 D b'1998AJ....115.1693C'
1002 b'SDSSCGB 16759' 15 04 30.1 +51 07 22 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
1422 b'SDSSCGB 58689.1' 15 01 06.789 +49 12 02.42 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2337 b'LEDA 2420368' 15 14 10.8 +52 39 24 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
1672 b'SDSSCGB 28885' 15 15 42.8 +50 44 36 5.0 5.0 NaN NaN 0.0 D O b'2009MNRAS.395..255M'
1046 b'SDSSCGB 69742.1' 15 07 41.497 +51 55 30.01 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
174 b'TYC 3484-935-1' 15 04 09.704 +49 54 48.74 7.0 7.0 205.000000 134.000000 90.0 B O b'2000A&A...355L..27H'
1012 b'PTF 09ge' 14 57 03.10 +49 36 40.8 6.0 6.0 NaN NaN 0.0 D O b'2009PASP..121.1395L'
720 b'TYC 3488-599-1' 15 20 04.645 +51 33 58.75 7.0 7.0 259.000000 162.000000 90.0 B O b'2000A&A...355L..27H'
2124 b'TYC 3488-370-1' 15 21 01.9167 +52 13 58.249 8.0 8.0 41.689999 32.049999 90.0 B b'1998A&A...335L..65H'
347 b'SDSSCGB 41762.4' 15 07 41.778 +49 30 05.40 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
2405 b'LEDA 2391614' 15 20 09.4 +51 17 15 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
2171 b'LEDA 2353010' 15 00 55.6 +49 46 29 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
413 b'LEDA 2350385' 15 08 18.731 +49 41 13.23 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
987 b'StKM 1-1204' 15 02 25.744 +49 17 05.41 7.0 7.0 145.000000 90.000000 90.0 B O b'2000A&A...355L..27H'
94 b'SDSS J151954.89+494751.5' 15 19 54.904 +49 47 51.58 7.0 7.0 90.000000 90.000000 90.0 B N b'2003yCat.2246....0C'

4330 rows × 11 columns


In [ ]:
# As duplicates were found, we will drop all but the first.
NonQuasars = NonQuasars.drop_duplicates(keep='first')

In [ ]:
# Reindexing
NonQuasars.reset_index(inplace=True)
NonQuasars.drop('index', axis=1, inplace=True)


C:\Users\plessas\AppData\Local\Continuum\Anaconda3\lib\site-packages\ipykernel\__main__.py:3: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame

See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
  app.launch_new_instance()

In [ ]:
# Saving a copy of the data to accompany the images.
NonQuasars.to_csv('NonQuasarsData.csv', index=False)

In [ ]:
NonQuasars.head()


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
0 b'[TRB2013] 227.70742+05.81570' 15 10 49.786 +05 48 56.56 7.0 7.0 10.0 10.0 90.0 C O b'2013yCat.5139....0A'
1 b'2MASX J15190843+0725544' 15 19 08.437 +07 25 54.48 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
2 b'[TCC97] ACO 2052 329' 15 14 04.6 +06 22 54 5.0 5.0 5000.0 5000.0 90.0 D NaN b'1997A&AS..125..459T'
3 b'PTF 12fuu' 15 04 40.39 +06 04 21.0 6.0 6.0 NaN NaN 0.0 D NaN b'2012ATel.4290....1G'
4 b'WiggleZ S15J151547006+05441318' 15 15 47.004 +05 44 13.17 7.0 7.0 202.0 104.0 0.0 C O b'2009yCat.2294....0A'

In [ ]:
NonQuasars.tail()  # 94670 rows


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
94665 b'LEDA 2411016' 15 08 00.3 +52 16 41 5.0 5.0 NaN NaN 0.0 D O b'2003A&A...412...45P'
94666 b'SDSSCGB 51281.3' 15 05 34.541 +52 00 48.57 7.0 7.0 NaN NaN 0.0 C O b'2009yCat.2294....0A'
94667 b'NVSS J151555+503517' 15 15 55.1 +50 35 14 5.0 5.0 1000.0 1000.0 90.0 C R b'1997ApJ...475..479W'
94668 b'MCG+09-25-033' 15 12 17.963 +50 33 28.53 7.0 7.0 NaN NaN 0.0 B I b'2006AJ....131.1163S'
94669 b'[SPD2011] 7549' 14 57 40.1 +49 30 15 5.0 5.0 NaN NaN 0.0 E O b'2011ApJ...736...21S'

In [ ]:
def RAtoICRS(RAValue):
    """Converts SIMBAD Right Ascension (RA) format to a string that SkyCoord can parse.

    Parameters
    ----------
    RAValue : A SIMBAD Right Ascension value in "X Y Z" format for X hours, Y minutes, and Z seconds.
        Lower-precision entries may omit the seconds, or the minutes and seconds.

    """
    parts = RAValue.split()
    if len(parts) == 1:
        return '%sh' % parts[0]
    elif len(parts) == 2:
        return '%sh%sm' % (parts[0], parts[1])
    elif len(parts) == 3:
        return '%sh%sm%ss' % (parts[0], parts[1], parts[2])
    else:
        # np.nan is a float constant, not a function.
        return np.nan


def DECtoICRS(DECValue):
    """Converts SIMBAD Declination (DEC) format to a string that SkyCoord can parse.

    Parameters
    ----------
    DECValue : A SIMBAD Declination value in "+X Y Z" format for X degrees, Y arcminutes, and Z arcseconds.
        Lower-precision entries may omit the arcseconds, or the arcminutes and arcseconds.

    """
    parts = DECValue.split()
    if len(parts) == 1:
        return '%sd' % parts[0]
    elif len(parts) == 2:
        return '%sd%sm' % (parts[0], parts[1])
    elif len(parts) == 3:
        return '%sd%sm%ss' % (parts[0], parts[1], parts[2])
    else:
        # np.nan is a float constant, not a function.
        return np.nan

In [ ]:
# Data manipulation to get Sky Coordinates for each entry.
# Note that SIMBAD gives the values separated as hours, minutes, seconds for RA and degrees, minutes, seconds for Dec
NonQuasarLocs = pd.concat([NonQuasars['RA'].apply(RAtoICRS),
                           NonQuasars['DEC'].apply(DECtoICRS)], axis=1)
NonQuasarLocs['Coords'] = NonQuasarLocs[['RA', 'DEC']].apply(
    lambda x: SkyCoord(x[0], x[1], frame='icrs'), axis=1)

In [ ]:
NonQuasarLocs.head()


Out[ ]:
RA DEC Coords
0 15h10m49.786s +05d48m56.56s <SkyCoord (ICRS): (ra, dec) in deg\n ( 227....
1 15h19m08.437s +07d25m54.48s <SkyCoord (ICRS): (ra, dec) in deg\n ( 229....
2 15h14m04.6s +06d22m54s <SkyCoord (ICRS): (ra, dec) in deg\n ( 228....
3 15h04m40.39s +06d04m21.0s <SkyCoord (ICRS): (ra, dec) in deg\n ( 226....
4 15h15m47.004s +05d44m13.17s <SkyCoord (ICRS): (ra, dec) in deg\n ( 228....
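
As a cross-check on the conversion above, the same sexagesimal values can be turned into decimal degrees by hand. The sketch below uses only plain Python and handles one, two, or three components; `sexagesimal_to_degrees` is an illustrative name, not an Astropy function. The first row of `NonQuasars` is a convenient test case because its identifier, `[TRB2013] 227.70742+05.81570`, appears to encode the position in decimal degrees.

```python
def sexagesimal_to_degrees(value, hours=False):
    """Convert 'X', 'X Y', or 'X Y Z' to decimal degrees.

    With hours=True the value is treated as right ascension in hours
    (1 hour = 15 degrees); otherwise as a signed declination in degrees.
    """
    parts = value.split()
    if not 1 <= len(parts) <= 3:
        return float('nan')
    # Take the sign from the text so that '-00 30 00' stays negative.
    sign = -1.0 if parts[0].startswith('-') else 1.0
    magnitude = 0.0
    for power, part in enumerate(parts):
        magnitude += abs(float(part)) / 60.0 ** power
    return sign * magnitude * (15.0 if hours else 1.0)


ra = sexagesimal_to_degrees('15 10 49.786', hours=True)
dec = sexagesimal_to_degrees('+05 48 56.56')
print(round(ra, 5), round(dec, 5))
```

The printed values should agree with the identifier to within the precision SIMBAD reports, and with the leading digits of the `Coords` column shown above.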

In [ ]:
# We will now download these images from SDSS DR12
for i in range(len(NonQuasarLocs)):  # 94670 rows
    get_image(NonQuasarLocs['Coords'].iloc[i], name='NonQ_' + str(i))

Gathering Quasar Candidates

We will now use SIMBAD to identify quasar candidates for analysis with our trained model.


In [ ]:
# As with the non-quasar data, we will stay in the 8h to 16h, +0d to +60d footprint
# region of SDSS. Note that there are some regions in SDSS DR 12 and DR 13 outside
# of this range, but this range covers a majority of the footprint.
# Due to timeout issues from SIMBAD, we will use smaller regions
# of width 10d by 6d to gather the candidates.
QuasarCandidates = pd.DataFrame()
for i in range(12):  # Separate longitude into 12 segments of length 10d
    for j in range(10):  # Separate latitude into 10 segments of length 6d
        try:
            # For otype Q? are quasar candidates.
            result = Simbad.query_criteria(
                'region(box,' + str(125 + 10 * i) + 'd +'
                + str(3 + 6 * j) + 'd' + ', 5d +3d)', 'otype = Q?')
            QuasarCandidates = pd.concat(
                [QuasarCandidates, result.to_pandas()], axis=0)
            if (i * 10 + j) % 20 == 0:
                print('At attempt %s' % (i * 10 + j))
        except:
            print('Attempt Failed at i=%s and j=%s.' % (i, j))


At attempt 0
At attempt 20
At attempt 40
At attempt 60
At attempt 80
At attempt 100
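
The nested loops above visit a regular grid of box centres. Listing the grid explicitly makes it easy to confirm the coverage before sending any queries. `box_centers` is a hypothetical helper, and its default start, step, and count values are taken from the loop above.

```python
def box_centers(ra_start=125, ra_step=10, n_ra=12,
                dec_start=3, dec_step=6, n_dec=10):
    """Return (ra_deg, dec_deg) centres in the order the loops visit them."""
    return [(ra_start + ra_step * i, dec_start + dec_step * j)
            for i in range(n_ra) for j in range(n_dec)]


centers = box_centers()
print(len(centers), centers[0], centers[-1])
# The same centres written as SIMBAD region strings, e.g. '125d +3d'.
regions = ['%dd +%dd' % c for c in centers]
```

The grid has 12 x 10 = 120 centres, which matches the attempt counter in the log running from 0 up to 119.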

In [ ]:
QuasarCandidates.head()


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
0 b'SDSS J082207.76+034040.3' 08 22 07.762 +03 40 40.39 7.0 7.0 83.0 75.0 0.0 C O b'2009yCat.2294....0A'
0 b'4C 07.25' 08 30 05.3 +07 45 46 5.0 5.0 NaN NaN 0.0 D b''
1 b'SDSS J082847.31+090335.2' 08 28 47.312 +09 03 35.22 7.0 7.0 74.0 69.0 0.0 C O b'2009yCat.2294....0A'
2 b'NVSS J081429+090749' 08 14 29.085 +09 07 48.52 7.0 7.0 159.0 121.0 0.0 C O b'2009yCat.2294....0A'
3 b'2MASS J08221083+0743435' 08 22 10.830 +07 43 43.59 7.0 7.0 170.0 100.0 90.0 B N b'2003yCat.2246....0C'

In [ ]:
QuasarCandidates.tail()


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
34 b'USNO-A2.0 1425-08394781' 15 53 37.759 +57 48 38.81 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
35 b'USNO-A2.0 1425-08369481' 15 46 26.011 +58 21 11.34 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
36 b'GALEX 2680919436624400066' 15 54 13.375 +58 11 33.90 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
37 b'USNO-A2.0 1425-08408588' 15 57 41.299 +56 01 23.63 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
38 b'USNO-A2.0 1425-08387451' 15 51 26.251 +57 53 54.28 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'

In [ ]:
# Checking for duplication
QuasarCandidates[QuasarCandidates.duplicated()]


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE

In [ ]:
# Reindexing
QuasarCandidates.reset_index(inplace=True)
QuasarCandidates.drop('index', axis=1, inplace=True)

In [ ]:
QuasarCandidates.tail()  # 5418 rows


Out[ ]:
MAIN_ID RA DEC RA_PREC DEC_PREC COO_ERR_MAJA COO_ERR_MINA COO_ERR_ANGLE COO_QUAL COO_WAVELENGTH COO_BIBCODE
5413 b'USNO-A2.0 1425-08394781' 15 53 37.759 +57 48 38.81 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
5414 b'USNO-A2.0 1425-08369481' 15 46 26.011 +58 21 11.34 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
5415 b'GALEX 2680919436624400066' 15 54 13.375 +58 11 33.90 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
5416 b'USNO-A2.0 1425-08408588' 15 57 41.299 +56 01 23.63 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'
5417 b'USNO-A2.0 1425-08387451' 15 51 26.251 +57 53 54.28 7.0 7.0 NaN NaN 0.0 D b'2007ApJ...664...53A'

In [ ]:
# Saving a copy of the data to accompany the images.
QuasarCandidates.to_csv('QuasarCandidatesData.csv', index=False)

In [ ]:
# Data manipulation to get Sky Coordinates for each entry.
# Note that SIMBAD gives the values separated as hours, minutes, seconds for RA and degrees, minutes, seconds for Dec
QuasarCandidateLocs = pd.concat([QuasarCandidates['RA'].apply(RAtoICRS),
                                 QuasarCandidates['DEC'].apply(DECtoICRS)], axis=1)
QuasarCandidateLocs['Coords'] = QuasarCandidateLocs[['RA', 'DEC']].apply(
    lambda x: SkyCoord(x[0], x[1], frame='icrs'), axis=1)

In [ ]:
QuasarCandidateLocs.head()


Out[ ]:
RA DEC Coords
0 08h22m07.762s +03d40m40.39s <SkyCoord (ICRS): (ra, dec) in deg\n ( 125....
1 08h30m05.3s +07d45m46s <SkyCoord (ICRS): (ra, dec) in deg\n ( 127....
2 08h28m47.312s +09d03m35.22s <SkyCoord (ICRS): (ra, dec) in deg\n ( 127....
3 08h14m29.085s +09d07m48.52s <SkyCoord (ICRS): (ra, dec) in deg\n ( 123....
4 08h22m10.830s +07d43m43.59s <SkyCoord (ICRS): (ra, dec) in deg\n ( 125....

In [ ]:
# We will now download these images from SDSS DR12
for i in range(len(QuasarCandidateLocs)):  # 5418 rows
    get_image(QuasarCandidateLocs['Coords'].iloc[i],
              name='QuasarCandidate_' + str(i))